Description

Background and Summary of the Invention 

[0001] The present application relates to wireless 
communication, and more particularly to turbo decoding 
and the like. 



Background: Error Correction

[0002] Coded digital communication systems use error control codes to improve data reliability at a given signal-to-noise ratio (SNR). For example, an extremely simple form (used in data storage applications) is to generate and transmit a parity bit with every eight bits of data; by checking parity on each block of nine bits, single-bit errors can be detected. (By adding three error-correction bits to each block, single-bit errors can be detected and corrected.) In general, error control coding includes a large variety of techniques for generating extra bits to accompany a data stream, allowing errors in the data stream to be detected and possibly corrected.

Background: Trellis Coding

[0003] One of the important techniques for error control is trellis coding. In this class of techniques, constraints are imposed on the sequence of symbols so that certain symbols cannot directly follow others. The constraints are often defined by a geometrical pattern (or "trellis") of allowed and disallowed transitions. The existence of constraints on the sequence of symbols provides some structure to the data sequence: by analyzing whether the constraints have been violated, multiple errors can be corrected. This is a very powerful class of coding techniques; the constraint geometry can be higher dimensional, or algebraic formulations can be used to express the constraints, and many variations can be used.

Background: Turbo Coding

[0004] The encoder side of a turbo coding architecture typically uses two encoders, one operating on the raw data stream and one on a shuffled (interleaved) copy of the same data stream, to generate two parity bits for each bit of the raw data stream. The encoder output thus contains three times as many bits as the incoming data stream. This "parallel concatenated encoder" (or "PCE") configuration is described in detail below.
[0005] The most surprising part of turbo coding was its decoding architecture. The decoder side invokes a process which (if the channel were noiseless) would merely reverse the transformation performed on the encoder side, to reproduce the original data. However, the decoder side is configured to operate on soft estimates of the information bits and refines the estimates through an iterative reestimation process. The decoder does not have to reach a decision on its first pass, but is generally allowed to iteratively improve the estimates of the information bits until convergence is achieved.

Background: MAP Decoders



[0006] MAP decoding is a computationally intensive technique, which has turned out to be very important for turbo decoding and for trellis-coded modulation. "MAP" stands for "maximum a posteriori": a MAP decoder outputs the most likely estimate for each symbol in view of earlier AND LATER received symbols. This is particularly important where trellis coding is used, since the estimate for each symbol is related to the estimates for following symbols.

[0007] By contrast, a maximum-likelihood ("ML") decoder tries to compute the transmitted sequence for which the actually received sequence was most likely. These verbal statements may sound similar, but the difference between MAP and ML decoding is very significant. ML decoding is computationally simpler, but in many applications MAP decoding is required.
[0008] MAP decoding normally combines forward- and back-propagated estimates: a sequence of received symbols is stored, and then processed in one direction (e.g. forward in time) to produce a sequence of forward transition probabilities, and then processed in the opposite direction (backward in time) to produce a sequence of backward transition probabilities. The net estimate for each symbol is generated by combining the forward and backward transition probabilities with the data for the signal actually received. (Further details of this procedure can be found in OPTIMAL DECODING OF LINEAR CODES FOR MINIMIZING SYMBOL ERROR RATE, Bahl, Cocke, Jelinek, and Raviv, IEEE Transactions on Information Theory, 1974, which is hereby incorporated by reference.)
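To make the forward/backward combination concrete, the following is a minimal log-domain sketch in Python, not the architecture disclosed here. It assumes a hypothetical 8-state recursive systematic code (feedback 1+D^2+D^3, feedforward 1+D+D^3, the constituent code used in 3GPP turbo coding), log-likelihood-ratio (LLR) inputs with bit b mapped to 2b-1, an unterminated block, and illustrative helper names (`step`, `gamma`, `log_map`). The forward metrics play the role of the alphas and the backward metrics the role of the betas discussed later.

```python
import math
import random

NUM_STATES = 8
NEG = -1e9  # finite stand-in for log(0) so max* never evaluates inf - inf


def step(state, u):
    """Assumed RSC (feedback 1+D^2+D^3, feedforward 1+D+D^3): (next_state, parity)."""
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3
    p = a ^ d1 ^ d3
    return (a << 2) | (d1 << 1) | d2, p


def max_star(a, b):
    """Exact MAX*: ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def gamma(lx, lp, u, p):
    """Log-domain branch metric from systematic and parity LLRs."""
    return 0.5 * ((2 * u - 1) * lx + (2 * p - 1) * lp)


def log_map(lx, lp):
    n = len(lx)
    # Forward pass (alphas), starting from the known all-zero state.
    alpha = [[NEG] * NUM_STATES for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, p = step(s, u)
                cand = alpha[k][s] + gamma(lx[k], lp[k], u, p)
                alpha[k + 1][ns] = max_star(alpha[k + 1][ns], cand)
        top = max(alpha[k + 1])
        alpha[k + 1] = [a - top for a in alpha[k + 1]]  # normalize
    # Backward pass (betas); unterminated block, so the end metrics are uniform.
    beta = [[0.0] * NUM_STATES for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for s in range(NUM_STATES):
            acc = NEG
            for u in (0, 1):
                ns, p = step(s, u)
                acc = max_star(acc, gamma(lx[k], lp[k], u, p) + beta[k + 1][ns])
            beta[k][s] = acc
        top = max(beta[k])
        beta[k] = [b - top for b in beta[k]]
    # Combine alpha, branch metric and beta into one LLR per bit.
    llr = []
    for k in range(n):
        num = den = NEG
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, p = step(s, u)
                m = alpha[k][s] + gamma(lx[k], lp[k], u, p) + beta[k + 1][ns]
                if u:
                    num = max_star(num, m)
                else:
                    den = max_star(den, m)
        llr.append(num - den)
    return llr


def encode(bits):
    """Parity stream of the assumed RSC, starting from state 0."""
    state, parity = 0, []
    for u in bits:
        state, p = step(state, u)
        parity.append(p)
    return parity


random.seed(1)
bits = [random.randint(0, 1) for _ in range(40)]
lx = [4.0 * (2 * b - 1) for b in bits]          # noiseless channel, |LLR| = 4
lp = [4.0 * (2 * p - 1) for p in encode(bits)]
decoded = [1 if v > 0 else 0 for v in log_map(lx, lp)]
print(decoded == bits)  # True
```

With noiseless inputs the hard decisions reproduce the transmitted bits; with real channel data the sign of each LLR is the bit decision and its magnitude the reliability.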
[0009] The combination of forward and backward computation requires a substantial amount of memory. Since the blocks in advanced cellular communications can be large (e.g. 5120 symbols), the memory required to store a value for each possible transition for each symbol in a block is large. To reduce the memory requirements during decoding, each block of data may be divided into many smaller blocks (e.g. 40 blocks of 128 symbols) for MAP decoding.
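The storage figures above can be checked with a few lines of arithmetic. This sketch assumes an 8-state binary trellis (two transitions leave each state) and one stored value per transition per symbol; word widths are deliberately left out.

```python
STATES = 8              # assumed 8-state trellis
BRANCHES_PER_STATE = 2  # binary input: two transitions leave each state


def stored_values(symbols):
    """Values kept if one metric per trellis transition per symbol is stored."""
    return symbols * STATES * BRANCHES_PER_STATE


block = stored_values(5120)   # whole block
window = stored_values(128)   # one smaller block
print(block, window, block // window)  # 81920 2048 40
```

The ratio equals the number of smaller blocks, since 40 x 128 = 5120.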

[0010] The trellis encoding is done on a complete block of data, so that starting and ending states are known for the complete block. However, the starting and ending states are not known for the intermediate blocks. This presents a problem for accurate processing of these smaller blocks, but it has been found that simply iterating the forward estimation process for a few symbols before the start of each block will ensure that processing of the first symbol in the block starts from a good set of initial values.
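The warm-up effect can be illustrated numerically. The sketch below, which reuses the same assumed 8-state code as an illustration, compares the alphas at a hypothetical block start (symbol 128) computed from the known initial state with alphas obtained by starting from uniform metrics a few symbols earlier. The noiseless LLRs and the warm-up lengths are illustrative choices, not values from this application.

```python
import math
import random

NUM_STATES = 8
NEG = -1e9  # finite stand-in for log(0)


def step(state, u):
    """Assumed RSC (feedback 1+D^2+D^3, feedforward 1+D+D^3): (next_state, parity)."""
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3
    return (a << 2) | (d1 << 1) | d2, a ^ d1 ^ d3


def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward(alpha, lx, lp):
    """Run normalized alpha updates over the given symbols, starting from `alpha`."""
    for x, y in zip(lx, lp):
        nxt = [NEG] * NUM_STATES
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, p = step(s, u)
                g = 0.5 * ((2 * u - 1) * x + (2 * p - 1) * y)
                nxt[ns] = max_star(nxt[ns], alpha[s] + g)
        top = max(nxt)
        alpha = [a - top for a in nxt]
    return alpha


random.seed(7)
bits = [random.randint(0, 1) for _ in range(200)]
state, parity = 0, []
for u in bits:
    state, p = step(state, u)
    parity.append(p)
lx = [4.0 * (2 * b - 1) for b in bits]   # noiseless LLRs (illustrative)
lp = [4.0 * (2 * p - 1) for p in parity]

START = 128  # hypothetical first symbol of an intermediate block
# Reference: alphas at START computed from the known state 0 at symbol 0.
exact = forward([0.0] + [NEG] * (NUM_STATES - 1), lx[:START], lp[:START])
errors = {}
for warm in (0, 8, 24):
    # Unknown state: start from uniform metrics `warm` symbols before the block.
    guess = forward([0.0] * NUM_STATES, lx[START - warm:START], lp[START - warm:START])
    errors[warm] = max(abs(a - b) for a, b in zip(exact, guess))
    print(warm, round(errors[warm], 6))
```

Without a warm-up the uniform guess differs from the reference by the full spread of the state metrics; after a couple of dozen symbols the two agree closely, which is the behavior the prologs described below rely on.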
MAP Decoding with Pipelined Windowed Processing

[0011] The present application discloses a technique for sub-block processing in MAP decoding which uses pipelining. Processing of alphas (the forward-propagated state metrics) is begun in parallel with processing of betas (the back-propagated state metrics). Preferably each stage of processing is further internally parallelized; but the pipelining of forward-propagated processing with back-propagated processing provides an additional degree of net improvement in throughput.
[0012] Advantages of the disclosed methods and structures, in various embodiments, can include one or more of the following:

faster processing;
less memory;
more iterations possible in a turbo decoder.

Brief Description of the Drawings 

[0013] The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:

Figure 1 shows a block diagram of a turbo decoder.
Figure 2 shows a block diagram of a MAP decoder that uses parallel sliding window processing.
Figure 3 shows a block diagram of the beta generation block within the MAP decoder.
Figure 4 shows a block diagram of the alpha generation block within the MAP decoder.
Figure 5 shows a block diagram of the extrinsic generation block within the MAP decoder.
Figure 6 is a timing chart of the pipelining within the beta block.
Figure 7 shows the timing offset between generation of alpha and beta sliding window blocks.
Figure 8 shows the correspondence between the alpha and beta sliding window blocks, with prologs.
Figure 9 shows an example of the order in which beta and alpha bits are processed.

Detailed Description of the Preferred Embodiments 

[0014] The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiment. However, it should be understood that this class of embodiments provides only a few examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily delimit any of the various claimed inventions. Moreover, some statements may apply to some inventive features but not to others.
[0015] Concurrent operation of system hardware allows simultaneous processing of more than one basic operation. Concurrent processing is often implemented with two well-known techniques: parallelism and pipelining.

[0016] Parallelism includes replicating a hardware structure in a system. Performance is improved by having multiple structures execute simultaneously on different parts of a problem to be solved.
[0017] Pipelining splits the function to be performed into smaller pieces and allocates separate hardware to each piece. More information on parallelism and pipelining can be found in "The Architecture of Pipelined Computers," by Kogge, which is hereby incorporated by reference.

[0018] Figure 1 shows a block diagram of a turbo decoder. Two main blocks, the turbo controller 102 and the MAP decoder 104, are shown. The turbo controller 102 stores the data streams (X, the systematic data 106; P, the parity data 108; and A, the A PRIORI data 110) that serve as input for the MAP decoder 104, and controls the order in which the data is input to the MAP decoder 104. The diagram shows each of the three data streams being input twice to the MAP decoder 104. Two separate sets of input data are required because the alpha and beta generation blocks require the data inputs in opposite orders. The extrinsic output of the MAP decoder 104 is returned to the controller 102 for another decoding iteration.

[0019] Figure 2 shows a block diagram of a MAP decoder 104 that uses parallel sliding window processing. The MAP decoder 104 receives the scaled systematic data signal 106, the scaled parity data signal 108, and the A PRIORI signal 110 as its inputs. There are N X signals 106, where N is the size of the interleaver. N X signals 106 are applied for each beta 208 and alpha 210 state vector, which are the respective outputs of the beta 202 and alpha 206 blocks. During beta generation, X 106 is applied in reverse order, and during alpha generation, X 106 is applied in forward order. There are also N P signals 108. N P signals 108 are applied for each alpha 210 and beta 208 vector. During beta generation, P 108 is applied in reverse order, and during alpha generation it is applied in forward order. The A PRIORI 110 is either the interleaved or deinterleaved extrinsic data from the previous MAP decoder operation. There are N A PRIORI signals 110, and one A PRIORI 110 is applied for each beta 208 and alpha 210 vector. A PRIORI 110 is applied in the same directions as X 106 and P 108 for beta and alpha generation.
[0020] The beta generation section 202, shown in more detail in Figure 3, receives inputs X 106, P 108, and A 110. It generates the beta state vector 208, which is stored in beta RAM 204. The alpha generation section 206 receives inputs X 106, P 108, and A 110 (but in reverse order relative to the beta input). The alpha generation block 206, shown in greater detail in Figure 4, generates the alpha state vector 210. The outputs 208, 210 of the beta and alpha generation sections serve as inputs for the extrinsic generation section 212, shown in Figure 5. These data streams must be properly sequenced with the parity stream P 108 before being input to the extrinsic section 212.

[0021] Figure 3 shows the beta generation stages. First, during the MAP reset state, the registers are set to their initial conditions for the beta state vector 208. The beta signal 208, X 106, P 108, and A 110 are summed together by the adder tree 302 according to the trellis used to encode the data. (In the preferred embodiment, an 8-state trellis is used.) The results are stored in registers 310. In the second stage, the results of the adder 302 are applied to the 8 MAX* blocks 304 and are then stored in the MAX* registers 312. Next, the unnormalized outputs advance to two separate normalization stages 306, 308, each of which has a register 314, 316 to store its results. Thus the total process has 4 stages within the feedback loop of the beta generation block 202, which require 4 clock cycles to complete. This latency (the 4 clock cycles) determines the level of pipelining available.
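The four stages of one beta update can be sketched as four small functions, one per register boundary. This is an illustrative Python model, not the hardware: it assumes the same hypothetical 8-state code as earlier examples, LLR inputs, and a split of normalization into "find the reference value" and "subtract it", which is only one plausible reading of the two normalization stages 306, 308.

```python
import math

NUM_STATES = 8


def step(state, u):
    """Assumed 8-state RSC (feedback 1+D^2+D^3, feedforward 1+D+D^3)."""
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3
    return (a << 2) | (d1 << 1) | d2, a ^ d1 ^ d3


def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def stage_add(beta_next, x, p_llr, a_llr):
    """Stage 1 (adder tree): branch metric plus successor beta for each (state, bit)."""
    sums = []
    for s in range(NUM_STATES):
        pair = []
        for u in (0, 1):
            ns, p = step(s, u)
            g = 0.5 * ((2 * u - 1) * (x + a_llr) + (2 * p - 1) * p_llr)
            pair.append(g + beta_next[ns])
        sums.append(pair)
    return sums


def stage_max_star(sums):
    """Stage 2: one MAX* per state merges its two outgoing branches."""
    return [max_star(b0, b1) for b0, b1 in sums]


def stage_norm_find(raw):
    """Stage 3 (normalization, part 1): pick the reference value, here the maximum."""
    return raw, max(raw)


def stage_norm_apply(raw_and_ref):
    """Stage 4 (normalization, part 2): subtract the reference so metrics stay bounded."""
    raw, ref = raw_and_ref
    return [b - ref for b in raw]


def beta_update(beta_next, x, p_llr, a_llr):
    r1 = stage_add(beta_next, x, p_llr, a_llr)   # loosely: registers 310
    r2 = stage_max_star(r1)                      # loosely: MAX* registers 312
    r3 = stage_norm_find(r2)                     # loosely: register 314
    return stage_norm_apply(r3)                  # loosely: register 316


beta = [0.0] * NUM_STATES  # e.g. uniform metrics at the start of a beta prolog
for x, p_llr in reversed([(1.5, -0.7), (-2.0, 0.3), (0.4, 1.1)]):
    beta = beta_update(beta, x, p_llr, a_llr=0.0)
print([round(b, 3) for b in beta])
```

Here the four functions run back to back, so the sketch shows only how the loop is cut into stages; the overlap of different sliding windows across those stages is what Figure 6 illustrates.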
[0022] The alpha generation section 206 is shown in Figure 4. First, the registers are set to their initial conditions. Then the data inputs are summed together by the adder 402, and the results are stored in registers 412. These are then input to the MAX* blocks 406 and stored in MAX* registers 414. The alpha generation section 206 also has two normalization stages 408, 410, each with its own register 416, 418. The latency of the alpha generation stage 206 is thus 4, allowing 4 levels of pipelining to be implemented.
[0023] Operating in parallel with the alpha generation section is the extrinsic generation section 212, shown in Figure 5. Alpha 210, beta 208, and P 108 are summed together by the adders 502 according to the trellis used, and the results are stored in registers 510. In the second stage, these results are applied to the MAX* blocks 504 and then stored in the MAX* registers 512. These results are again applied to MAX* blocks 504 and then stored in registers 508. The result is summed and stored in another register 514, and the output is the extrinsic signal 214.
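A behavioral sketch of the extrinsic computation for a single bit follows, assuming the same hypothetical 8-state code and LLR inputs as the earlier examples. Only alpha, beta and the parity term enter, matching the inputs listed above; a sequential `functools.reduce` stands in for the two levels of MAX* blocks, and the final subtraction of the "bit = 1" and "bit = 0" results corresponds to the last summing step.

```python
import functools
import math

NUM_STATES = 8
NEG = -1e9  # finite stand-in for log(0)


def step(state, u):
    """Assumed 8-state RSC (feedback 1+D^2+D^3, feedforward 1+D+D^3)."""
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3
    return (a << 2) | (d1 << 1) | d2, a ^ d1 ^ d3


def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def extrinsic(alpha_k, beta_next, p_llr):
    """Extrinsic LLR for one bit from alpha, beta and the parity term only."""
    branches = {0: [], 1: []}
    for s in range(NUM_STATES):
        for u in (0, 1):
            ns, p = step(s, u)
            # adder stage: alpha + parity part of the branch metric + beta
            branches[u].append(alpha_k[s] + 0.5 * (2 * p - 1) * p_llr + beta_next[ns])
    # MAX* over the eight u = 1 branches and over the eight u = 0 branches
    one = functools.reduce(max_star, branches[1])
    zero = functools.reduce(max_star, branches[0])
    return one - zero  # final subtraction


uniform = [0.0] * NUM_STATES
known_zero = [0.0] + [NEG] * (NUM_STATES - 1)
print(extrinsic(uniform, uniform, 0.0))     # 0.0: nothing favors either bit value
print(extrinsic(known_zero, uniform, 3.0))  # 3.0: parity alone favors u = 1
```

The systematic and a priori terms are left out on purpose, so the output carries only the information contributed by the code structure and the parity stream.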



Parallelism with Sliding Windows 

[0024] A sliding window approach basically consists of dividing the N-sized block of incoming data into several smaller blocks. Each of these smaller blocks is called a sliding window block. These sliding window blocks are each MAP decoded independently, with a prolog for both the alpha and beta vectors. The decoding for the individual alpha and beta sliding window blocks is done in parallel. Since the initial conditions are not known for the individual sliding window blocks, the prologs are used to reach a good set of initial values.
[0025] By starting the update of the alpha at a point sufficiently inside the previous block and starting the update of the beta at a point sufficiently inside the next block, the decoder can "forget" the initial conditions and converge before it begins operating on the actual data. The prolog section size used is generally 3 or 4 times the number of states in the trellis. The first alpha and last beta sliding blocks will originate from a known state, and the size of their respective prolog sections will be 3 for an 8-state trellis (for example).
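The window and prolog boundaries can be computed as below. This is a sketch under stated assumptions: a prolog of 24 symbols (3 times the 8 states), a hypothetical 400-symbol block split into windows of 100, the first alpha window given an empty prolog because it starts from the known initial state, and the last beta window using 3 tail symbols as its prolog. The text above quotes a size of 3 for both known-state cases; this sketch treats only the tail case that way.

```python
from dataclasses import dataclass


@dataclass
class Window:
    data: range          # symbols whose estimates this window delivers
    alpha_prolog: range  # warm-up symbols to the left (processed forward first)
    beta_prolog: range   # warm-up symbols to the right (processed backward first)


def make_windows(n, size, prolog, tail=3):
    """Split an n-symbol block into sliding windows with alpha and beta prologs.

    Assumptions of this sketch: the first alpha window starts from the known
    initial state (empty prolog), and the last beta window uses the `tail`
    termination symbols as its prolog.
    """
    windows = []
    for start in range(0, n, size):
        end = min(start + size, n)
        alpha_prolog = range(max(start - prolog, 0), start)
        if end < n:
            beta_prolog = range(end, min(end + prolog, n))
        else:
            beta_prolog = range(n, n + tail)
        windows.append(Window(range(start, end), alpha_prolog, beta_prolog))
    return windows


for w in make_windows(400, 100, 24):
    print(w.data, w.alpha_prolog, w.beta_prolog)
```

Each alpha prolog overlaps the end of the previous window and each beta prolog overlaps the start of the next, which is the overlap drawn in Figure 8.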

[0026] The innovative alpha prolog allows parallel processing of both the alpha and beta sliding window blocks of data. Depending on the specific implementation used, each update of alpha or beta takes a few clock cycles to run (4 clock cycles in the above embodiment). This latency determines the degree of pipelining possible in the system. In the preferred embodiment, there are four levels of pipelining within each alpha and beta block (meaning the data within each of the alpha and beta generation stages is pipelined, or broken into separate sets of data and independently operated on by successive stages within the respective generation section). There is also a degree of parallelism between the alpha and beta blocks themselves, meaning these two sections operate simultaneously to produce extrinsic input.
[0027] The alpha and beta vector generation processes are divided into multiple stages, as shown above. These stages are within the iteration loops of the alpha and beta vector generation, shown in Figures 3 and 4. The number of stages would be equal to the latency for a particular architecture. In the preferred embodiment, these stages are the Adder, the MAX*, and two Normalization stages. The latency of these stages dictates the degree of parallel processing possible. For example, in the preferred embodiment this latency is 4, meaning 4 sliding-window blocks can be processed in parallel. Thus, 4 sliding-window blocks make up one sub-block.
[0028] The pipelining of the sliding blocks is shown in Figure 6. During the first clock cycle, beta0 (the first sliding block) enters the adder stage. In the second clock cycle, beta0 enters the MAX* stage, and beta1 enters the adder stage. In the third clock cycle, beta0 enters the first normalization stage (the third stage of beta generation), beta1 enters the MAX* stage, and beta2 enters the adder stage. In the fourth clock cycle, beta0 enters the second normalization stage, beta1 enters the first normalization stage, beta2 enters the MAX* stage, and beta3 enters the adder stage. The intermediate values for each stage are stored in registers, as shown above.
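The timing of Figure 6 can be modeled in a few lines. This illustrative sketch assumes the four stages named above (labeled here "adder", "MAX*", "norm1", "norm2"), four sliding windows per sub-block, and round-robin issue of one window per clock cycle; it reports which window, and which trellis step of that window, occupies each stage in a given cycle.

```python
STAGES = ("adder", "MAX*", "norm1", "norm2")  # four-stage beta loop, latency 4
WINDOWS = 4                                    # sliding windows per sub-block


def occupancy(cycle):
    """Which (window, trellis step) sits in each stage during clock `cycle` (1-based).

    Windows issue round-robin, one per cycle, so window w issues trellis step t
    in cycle WINDOWS*t + w + 1 and reaches stage d exactly d cycles later.
    """
    busy = {}
    for depth, stage in enumerate(STAGES):
        issue = cycle - 1 - depth  # 0-based cycle in which this operation entered the adder
        if issue >= 0:
            busy[stage] = (f"beta{issue % WINDOWS}", issue // WINDOWS)
    return busy


for cycle in range(1, 6):
    print(cycle, occupancy(cycle))
```

Because the number of windows equals the loop latency, beta0's second update enters the adder in cycle 5, immediately after its first update leaves the second normalization stage, so the feedback dependency never stalls the pipeline.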
[0029] Either the beta or the alpha state vectors are stored in memory so that the data input to the extrinsic section can be synchronized. In the preferred embodiment, beta processing begins one sub-block before alpha processing. (Note that this staggering could be eliminated by adding another RAM block to store the alpha outputs.) This staggering is shown in Figure 7. The first sub-block of the beta section (which is a number of sliding blocks equal to the latency of the architecture, 4 in the preferred embodiment) can be processed while the alpha section is idle. Next, the second set of sliding blocks of the beta section (i.e., the second sub-block) is processed while the first set of sliding blocks of the alpha section is processed. The extrinsic sections are processed in parallel with the alpha section. This reduces the memory requirement for storing both the alpha and beta state vectors, because the alpha outputs can be directly applied to the extrinsic section as they are generated. Since the extrinsic section generates output (and requires input) one sub-block at a time, the beta RAM only needs to store one sub-block of data at a time. (Note that the alpha and beta processing could be reversed. This would require the alpha outputs to be stored in memory, and the beta and extrinsic blocks would run in parallel.)
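The staggered schedule of Figure 7 reduces to a simple table. In this sketch a "slot" is the (assumed equal) time needed to process one sub-block; how the beta RAM is read and rewritten within a slot is not modeled.

```python
def stagger(num_subblocks):
    """Sub-block schedule in which the betas run one slot ahead.

    Returns (slot, beta sub-block, alpha/extrinsic sub-block); None marks idle.
    """
    plan = []
    for slot in range(num_subblocks + 1):
        beta = slot if slot < num_subblocks else None
        alpha = slot - 1 if slot > 0 else None
        plan.append((slot, beta, alpha))
    return plan


plan = stagger(3)
for row in plan:
    print(row)
# Every alpha/extrinsic sub-block runs one slot after its betas were produced.
assert all(alpha is None or alpha < slot for slot, _, alpha in plan)
```

The alpha section is idle only in the first slot and the beta section only in the last, so a block of k sub-blocks takes k + 1 slots instead of 2k for strictly sequential beta and alpha passes.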
[0030] Figure 8 shows the correspondence between alpha and beta sliding-window blocks. The entire data block consists of N symbols plus a number of tail bits. This block is broken into sub-blocks, which are further divided into sliding-window blocks. One sliding-window block is processed per clock cycle (that is, on each clock cycle a different sliding-window block of the sub-block enters the pipeline). Each sliding-window block includes a prolog. The beta prologs consist of several symbols to the right of the sliding window. The alpha prolog consists of several symbols to the left of the sliding window. This is shown by the overlap between successive sliding blocks in the figure. Each beta sliding window is processed in reverse relative to the alpha sliding blocks.

[0031] Figure 9 shows an example of the order in which beta and alpha bits are processed. This example assumes a sliding window size of 100, a prolog length of 24, and 4 sliding windows per sub-block. The sliding block beta0 begins at the start of its prolog, at bit 123. The prolog ends at bit 100. The reliability data begins at bit 99 and ends at bit 0. The alpha sliding blocks are similarly divided. (Note that the first two entries for alpha do not exist, because there is no prolog at the beginning of the block, since the start and end points of the block are known.)
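The visiting order of Figure 9 can be generated from the three example parameters. In this sketch the idle prolog slots of the first alpha window are marked `None` (an assumption about how the missing entries are handled), and beta3's prolog reaches bit 423 on the assumption that the block continues past bit 399.

```python
WINDOW, PROLOG, PER_SUBBLOCK = 100, 24, 4   # values from the Figure 9 example


def beta_bits(w):
    """Symbols visited by beta window w: its right-hand prolog, then its data, descending."""
    start, end = w * WINDOW, (w + 1) * WINDOW
    return list(range(end + PROLOG - 1, start - 1, -1))


def alpha_bits(w):
    """Symbols visited by alpha window w: its left-hand prolog, then its data, ascending.

    Window 0 starts from the known state, so (as an assumption of this sketch)
    its prolog slots are left idle and marked None.
    """
    start, end = w * WINDOW, (w + 1) * WINDOW
    if w == 0:
        return [None] * PROLOG + list(range(start, end))
    return list(range(start - PROLOG, end))


def issue_order(seqs):
    """Round-robin: one symbol from each window of the sub-block per clock cycle."""
    return [seq[k] for k in range(len(seqs[0])) for seq in seqs]


betas = issue_order([beta_bits(w) for w in range(PER_SUBBLOCK)])
alphas = issue_order([alpha_bits(w) for w in range(PER_SUBBLOCK)])
print(betas[:8])                   # [123, 223, 323, 423, 122, 222, 322, 422]
print(alphas[:4], alphas[96:104])  # [None, 76, 176, 276] [0, 100, 200, 300, 1, 101, 201, 301]
```

Once the alpha prologs are finished, the alpha data emerge in the order 0, 100, 200, 300, 1, 101, ..., which is exactly the order the extrinsic section consumes.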

[0032] The extrinsic cannot be processed in parallel with both the alpha and beta generation processes, because the extrinsic input data, which requires data from alpha, beta, and the parity data, must be input in a certain order. The following shows the indexing of the extrinsic input. E0 (corresponding to alpha0 and beta0) goes from bit 0 to 99, E1 goes from 100 to 199, and so on, given a sliding window size of 100. The input required by this example would be as follows. In the first clock cycle, the soft estimate data relating to bit 0 from alpha, beta, and P are input to the extrinsic. In the second clock cycle, data associated with bit 100 from the three inputs is required. In the third clock cycle, the data associated with bit 200 is required. In the fourth clock cycle, the data associated with bit 300 is required. In the fifth clock cycle, the input reverts to the data associated with bit 1 (the first clock cycle input shifted by one bit). In the next cycle, the bit 101 data is required, and so on. Thus the betas must be stored in RAM after they are generated, because they are generated in a different order than the alpha bits and parity bits, and are not required at generation time, as the alpha bits and parity bits are. When the corresponding alphas and betas have been generated, the extrinsic may be calculated.
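Why the betas must be buffered can be seen by comparing the two orders in the example above. This sketch assumes the data phase only (prologs omitted), 4 windows of 100 bits, and that each beta lands in RAM on the cycle it is produced.

```python
WINDOW, PER_SUBBLOCK = 100, 4   # values from the example above


def extrinsic_order():
    """Bit fed to the extrinsic section on each cycle: 0, 100, 200, 300, 1, 101, ..."""
    return [w * WINDOW + k for k in range(WINDOW) for w in range(PER_SUBBLOCK)]


def beta_data_order():
    """Order in which the beta section finishes data bits (descending inside each window)."""
    return [w * WINDOW + k for k in range(WINDOW - 1, -1, -1) for w in range(PER_SUBBLOCK)]


need = extrinsic_order()
made = beta_data_order()
ready_at = {bit: cycle for cycle, bit in enumerate(made)}   # cycle a beta lands in RAM

print(need[:8])           # [0, 100, 200, 300, 1, 101, 201, 301]
print(ready_at[need[0]])  # 396: the first beta needed is among the last produced
# Hence the whole sub-block of betas is written to RAM before the extrinsic pass reads it.
print(max(ready_at[b] for b in need[:PER_SUBBLOCK]) == len(made) - 1)  # True
```

The alphas, by contrast, come out of the pipeline in the same order as `need`, so they can go straight to the extrinsic section without storage.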
Definitions: 

[0033] Following are short definitions of the usual meanings of some of the technical terms which are used in the present application. (However, those of ordinary skill will recognize whether the context requires a different meaning.) Additional definitions can be found in the standard technical dictionaries and journals.

[0034] MAX*: MAX* is a maximum-finding approximation for the natural log of a sum of exponentials, given by the following equation:

$$\ln\left(e^{A} + e^{B}\right) = \mathrm{MAX}^*(A,B) = \max(A,B) + f(|A-B|)$$

where f(|A-B|) is a correction term. A lookup table is usually used for this value, which makes the above expression an approximation. If the expression

$$f(|A-B|) = \ln\left(1 + e^{-|A-B|}\right)$$

is used instead of a lookup table, then the MAX* definition becomes an exact equality, not an approximation.

MAP decoder: Maximum A-Posteriori. MAP decoders use a detection criterion that leads to the selection of x that maximizes the probability p(x|r) of a symbol x given the received information r.

Extrinsic: Outputs of decoders that estimate the value of a decoded bit. Extrinsics are usually soft estimates.
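The exact and table-based forms of MAX* can be compared directly. The table resolution (0.5) and length (8 entries, so |A-B| up to 4) in this sketch are hypothetical choices for illustration; the document does not specify a table.

```python
import math


def max_star_exact(a, b):
    """MAX* with the exact correction f(|a-b|) = ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


STEP = 0.5  # hypothetical table resolution
# Correction term sampled at |a-b| = 0, 0.5, ..., 3.5; beyond that it is treated as 0.
TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(8)]


def max_star_lut(a, b):
    """MAX* with f read from a small lookup table (the usual approximation)."""
    i = int(abs(a - b) / STEP)
    return max(a, b) + (TABLE[i] if i < len(TABLE) else 0.0)


for a, b in [(1.0, 1.0), (2.0, 0.3), (5.0, -3.0)]:
    reference = math.log(math.exp(a) + math.exp(b))
    print(a, b, round(reference, 4), round(max_star_exact(a, b), 4), round(max_star_lut(a, b), 4))
```

The exact form matches ln(e^A + e^B) to rounding error, while the table version is off by at most the change in f across one table step, which is why a short table is usually adequate.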

Modifications and Variations 

[0035] As will be recognized by those skilled in the art, the innovative concepts described in the present application can be modified and varied over a tremendous range of applications, and accordingly the scope of patented subject matter is not limited by any of the specific exemplary teachings given, but is only defined by the issued claims.

[0036] Though the preferred embodiment is given in specific detail, many alterations can be made in its implementation without escaping the scope of the inventive concepts herein disclosed. For instance, the latency of each state vector generation stage can be varied (by adding registers, or other means), and thus the degree of possible pipelining will vary. The size of the trellis can also be changed without altering the inventive concepts applied in the embodiment. The betas, alphas, and extrinsics may be generated in various parallel combinations, with only minor changes in RAM storage required.
[0037] Those of skill in the art will know that the definitions of inputs used in the present application (the systematic data X, and the parity data P) may be generalized to cover a broader range of applications. For instance, these inputs may differ in such applications as MAP equalization or turbo trellis decoding. In some applications, the inputs may not be soft estimates of bits, but rather they may be soft estimates of other variables. The disclosed innovations are intended to cover all such variations in implementation.

[0038] The disclosed innovations of the present application are applicable to any MAP architecture. For instance, any implementation of the disclosed inventive concepts in turbo decoders which use MAP decoders is within the contemplation of the invention. Any MAP operations, e.g., MAP equalization, are within the contemplation of the present application. MAP equalization is the process of describing the channel function as the data input to the channel constrained on a trellis to produce the observed output. The input to the channel can then be estimated in a maximum a posteriori sense by applying a MAP decode to the trellis diagram and the observed channel output. This is useful if (a) soft output is required from the equalizer, (b) a more accurate estimate of the input to the channel is required than can be obtained using a linear filter or equalizer, or (c) an iterative joint decode of the channel and the applied FEC is required. In general, MAP finds use in any situation where the data observed is known to have been generated by input to a linear trellis.

[0039] Likewise, MAP architectures with software, as well as hardware, implementations are within the contemplation of the invention. In today's DSPs, very high processing rates are achieved by using deep pipelining of the data path. This means the DSP cannot be efficiently used in a feedback process such as beta and alpha updates. Using the present invention allows several blocks to be simultaneously processed by the DSP in a pipelined fashion, which considerably speeds up the operation in a deeply pipelined DSP architecture.
[0040] Further background material on the state of the art in MAP decoders and coding can be found in TURBO CODING, by Heegard and Wicker; TRELLIS CODING, by Schlegel; ERROR CONTROL SYSTEMS, by Wicker; and AN INTUITIVE JUSTIFICATION AND A SIMPLIFIED IMPLEMENTATION OF THE MAP DECODER FOR CONVOLUTIONAL CODES, Andrew Viterbi, IEEE Journal on Selected Areas in Communications, Vol. 16, No. 2, February 1998, all of which are hereby incorporated by reference.
Claims

1. A MAP decoding method, comprising the steps of:

performing a first sliding window operation in a first direction on at least a partial block of data, to thereby obtain first derived parameters;
performing a second sliding window operation in a second direction, which is opposite to said first direction, on at least a partial block of said data, to thereby obtain second derived parameters; and
processing said first and second derived parameters, to thereby generate data estimate values;

wherein said sliding window operations are pipelined with each other, to operate in parallel on different respective portions of data.

2. The method of Claim 1, wherein the sliding window operations are each divided into separate stages, and the separate stages operate in parallel on different partial blocks of data.

3. A method for bi-directionally processing a block of data, which does not necessarily have a known state at endpoints thereof, according to at least one sequencing constraint, comprising the steps of:

sequentially processing data elements of the block in a first direction, after first processing prolog elements in said first direction in accordance with said sequencing constraint; and
sequentially processing said data elements in a second direction, after first processing prolog elements in said second direction in accordance with said sequencing constraint.

4. The method of Claim 3, wherein the processing of data elements in the first direction and the processing of data elements in the second direction are done in parallel.

5. The method of Claim 3, wherein each step of processing data elements is divided into separate stages, and the separate stages operate in parallel on different data elements.

6. A method for parallel MAP processing of a lattice-coded block of data, comprising the steps of:
dividing the data into sliding window blocks, and, for each of multiple ones of said sliding window blocks,

a) sequentially processing the elements of the respective sliding window block in a first direction, after first processing prolog elements in said first direction in accordance with a sequencing constraint; and

b) sequentially processing the elements of the respective sliding window block in a second direction, after first processing prolog elements in said second direction in accordance with said sequencing constraint;

wherein said steps a) and b) are performed at least partly in parallel with each other.
7. The method of Claim 6, wherein steps a) and/or b) are divided into separate stages, and the separate stages operate in parallel on different sliding window blocks.

8. A method for parallel MAP processing, comprising the steps of:

a) combining probability metrics in at least one adder tree; and

b) performing a maximum-finding operation to combine ones of said metrics which correspond to alternative possibilities;

wherein said steps a) and b) are at least partly performed in a parallelized pipeline relationship with each other.

9. The method of Claim 8, wherein the maximum-finding operation is an exponent-logarithm equation.

10. The method of Claim 8, wherein the maximum-finding operation is an estimation of an exponent-logarithm function.

11. A method for parallel MAP processing, comprising the steps of:

a) combining probability metrics in at least one adder tree;

b) performing a maximum-finding operation to combine ones of said metrics which correspond to alternative possibilities;

c) performing a normalization operation on the results of said step b);

wherein said steps a), b), and c) are at least partly performed in a parallelized pipeline relationship with each other.

12. The method of Claim 11, wherein the maximum-finding operation is an exponent-logarithm equation.

13. The method of Claim 11, wherein the maximum-finding operation is an estimation of an exponent-logarithm equation.

14. A system for MAP processing of a data stream, the data stream being divided into sliding window blocks, comprising:

an alpha generation process;
a beta generation process;

wherein the alpha generation process and the beta generation process are divided into multiple pipelining stages to operate on multiple sliding window blocks using alpha and beta prologs,